The data here are a re-creation of the 2016 analysis from Vox, which Vox originally modeled off of the Washington State Department of Health’s analysis. They were derived from tract-level Census Bureau data for “poverty status in the last 12 months” (Table S1701) and “year structure built” (Table B25034).
The final product in this dataset is a lead risk score ranging from 1-10, 1 indicating very little lead exposure risk and 10 indicating very high lead exposure risk. These lead risk scores take into account both the age of housing and poverty status, which have both been shown to influence the risk of lead poisoning in children. (There are other factors that influence lead exposure, but it is often hard to get good data on those things.)
First, I imported the poverty and housing data from the US Census Bureau for all 50 states. Then I removed all census tracts that had missing observations for number of households or number of individuals for whom poverty status is determined.
Then I created five categories for housing risk:
age_39, which is the number of households built before 1930, multiplied by 0.68age40_59, which is the number of households built between 1940 and 1959, multiplied by 0.43age60_79, which is the number of households built between 1960 and 1979, multiplied by 0.08age80_99, which is the number of households built between 1980 and 1999, multiplied by 0.03age00_plus, which is the number of households built in 2000 or later, multiplied by 0The numbers that each category is multiplied by indicate the percentage of households in that age range that contain lead paint, as established by Jacobs et al (2002).
I then calculated the housing risk for each census tract by adding together the five age categories and dividing that by the total number of households.
To calculate poverty risk, I divided the number of people under 125% of the poverty line by the total number of people.
Then I standardized the housing and poverty risks by calculating z-scores for them. After standardizing them, I created a weighted lead risk, with housing weighted at 0.58, and poverty weighted at 0.42. Then I added the weighted lead risks to create a combined lead risk. Those combined risks were then split up into deciles to create the lead risk scores from 1-10.
All of this was done at the national level first and then restricted to the Charlottesville area, so all lead risk scores are relative to the rest of the country.
glimpse(leadrisk)
## Rows: 50
## Columns: 19
## $ GEOID <dbl> 51065020200, 51003010201, 51003010202, 510030104…
## $ NAME <chr> "Census Tract 202, Fluvanna County, Virginia", "…
## $ age_39 <dbl> 374.00, 47.60, 87.72, 234.60, 27.20, 17.00, 7.48…
## $ age40_59 <dbl> 132.44, 8.60, 14.19, 102.34, 6.45, 8.17, 3.01, 3…
## $ age60_79 <dbl> 36.24, 36.56, 36.64, 39.52, 30.88, 72.72, 11.60,…
## $ age80_99 <dbl> 18.72, 28.08, 15.39, 17.76, 19.77, 41.82, 26.31,…
## $ age00_plus <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ total <dbl> 2297, 1995, 1359, 2137, 1512, 2812, 2000, 4186, …
## $ sum_housing_risk <dbl> 561.40, 120.84, 153.94, 394.22, 84.30, 139.71, 4…
## $ housing_risk <dbl> 24.440575, 6.057143, 11.327447, 18.447356, 5.575…
## $ total_poverty_status <dbl> 4804, 4790, 3093, 4167, 3000, 6095, 4297, 7017, …
## $ total_under_125pct <dbl> 439, 114, 149, 388, 103, 1176, 452, 901, 1166, 8…
## $ poverty_risk <dbl> 9.138218, 2.379958, 4.817329, 9.311255, 3.433333…
## $ z_housing_risk <dbl> 0.2870704, -0.9607426, -0.6030100, -0.1197315, -…
## $ z_poverty_risk <dbl> -0.734216628, -1.228102854, -1.049982438, -0.721…
## $ weighted_housing_risk <dbl> 0.16650086, -0.55723072, -0.34974578, -0.0694442…
## $ weighted_poverty_risk <dbl> -0.308370984, -0.515803199, -0.440992624, -0.303…
## $ leadriskscore_raw <dbl> -0.14187013, -1.07303392, -0.79073840, -0.372504…
## $ lead_risk_rank <dbl> 5, 1, 2, 4, 1, 3, 1, 2, 2, 3, 5, 5, 8, 10, 6, 5,…
leadrisk %>% select(-c(GEOID:NAME)) %>%
select(where(~is.numeric(.x) && !is.na(.x))) %>%
as.data.frame() %>%
stargazer(., type = "text", title = "Summary Statistics", digits = 0,
summary.stat = c("mean", "sd", "min", "median", "max"))
##
## Summary Statistics
## =======================================================
## Statistic Mean St. Dev. Min Median Max
## -------------------------------------------------------
## age_39 131 123 0 89.1 496
## age40_59 89 80 0 69 409
## age60_79 40 24 3 35 139
## age80_99 23 16 2 21 78
## age00_plus 0 0 0 0 0
## total 2,260 984 148 2,128 4,626
## sum_housing_risk 283 171 4 253 728
## housing_risk 13 8 2 11 33
## total_poverty_status 4,785 2,089 287 4,731.5 10,421
## total_under_125pct 731 558 103 637 3,289
## poverty_risk 18 17 2 13 95
## z_housing_risk -0 1 -1 -1 1
## z_poverty_risk -0 1 -1 -0 6
## weighted_housing_risk -0 0 -1 -0 0
## weighted_poverty_risk -0 1 -1 -0 2
## leadriskscore_raw -0 1 -1 -0 2
## lead_risk_rank 4 3 1 4 10
## -------------------------------------------------------
leadrisk %>%
ggplot() +
geom_histogram(aes(x=lead_risk_rank)) +
scale_x_continuous("Lead risk rank", breaks=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), labels=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
leadrisk %>%
ggplot() +
geom_point(aes(x=housing_risk, y=poverty_risk)) +
xlim(0, 100) +
ylim(0, 100)
cville_shapes <- readRDS("../cville_region_collection/data/cville_tracts.RDS")
leadrisk <- merge(cville_shapes, leadrisk, by = 'GEOID', all.x = TRUE)
leadrisk <- st_transform(leadrisk, crs = 4326)
pal <- colorFactor("viridis", reverse = TRUE, domain = leadrisk$lead_risk_rank)
leaflet(leadrisk) %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(data = leadrisk,
fillColor = ~pal(lead_risk_rank),
weight = 1,
opacity = 1,
color = "white",
fillOpacity = 0.6,
highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
popup = paste0(leadrisk$NAME.y, "<br>",
"Lead Risk Rank: ", leadrisk$lead_risk_rank)) %>%
addLegend("bottomright", pal = pal, values = leadrisk$lead_risk_rank,
title = "Lead Risk Rank", opacity = 0.7)
pal <- colorNumeric("viridis", reverse = TRUE, domain = leadrisk$housing_risk)
leaflet(leadrisk) %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(data = leadrisk,
fillColor = ~pal(housing_risk),
weight = 1,
opacity = 1,
color = "white",
fillOpacity = 0.6,
highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
popup = paste0(leadrisk$NAME.y, "<br>",
"Housing Risk: ", round(leadrisk$housing_risk, 2))) %>%
addLegend("bottomright", pal = pal, values = leadrisk$housing_risk,
title = "Housing Risk", opacity = 0.7)
pal <- colorNumeric("viridis", reverse = TRUE, domain = leadrisk$poverty_risk)
leaflet(leadrisk) %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(data = leadrisk,
fillColor = ~pal(poverty_risk),
weight = 1,
opacity = 1,
color = "white",
fillOpacity = 0.6,
highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
popup = paste0(leadrisk$NAME.y, "<br>",
"Poverty Risk: ", round(leadrisk$poverty_risk, 2))) %>%
addLegend("bottomright", pal = pal, values = leadrisk$poverty_risk,
title = "Poverty Risk", opacity = 0.7)
pal <- colorNumeric("viridis", reverse = TRUE, domain = leadrisk$leadriskscore_raw)
leaflet(leadrisk) %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(data = leadrisk,
fillColor = ~pal(leadriskscore_raw),
weight = 1,
opacity = 1,
color = "white",
fillOpacity = 0.6,
highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
popup = paste0(leadrisk$NAME.y, "<br>",
"Raw Lead Risk Score: ", round(leadrisk$leadriskscore_raw, 2))) %>%
addLegend("bottomright", pal = pal, values = leadrisk$leadriskscore_raw,
title = "Raw Lead Risk Score", opacity = 0.7)